Approximate computing is an alternative computing approach that can 
lead to high-performance implementations of audio and image processing as 
well as deep learning applications. However, most of the available 
approximate adders have been designed for application specific integrated 
circuits (ASICs), and they do not result in an efficient implementation on 
field programmable gate arrays (FPGAs). In this paper, we design a 
new approximate adder customized for efficient implementation on FPGAs 
and use it to build a Gaussian filter. The experimental 
results of implementing the Gaussian filter based on the proposed 
approximate adder on a Virtex-7 FPGA indicate that resource 
utilization decreases by 20-51% and that the delay of the filter based on 
the modified design methodology for building approximate adders for 
FPGA-based systems (MDeMAS) adder improves by 10-35%, depending on the 
obtained output quality. 
1. INTRODUCTION 

A wide range of applications, from small-scale embedded systems to high-performance computing 
systems, can be implemented using field programmable gate arrays (FPGAs). The advantages of FPGAs are 
short time to market, high flexibility, and run-time configurability [1]. However, FPGA systems consume 
more power or energy than application specific integrated circuits (ASICs), despite the availability of 
hardware accelerators and special co-processors [2]. Hence, new FPGA-based energy efficiency optimization 
methods should be developed and employed in addition to conventional power reduction 
techniques. One of these new approaches is approximate computation, which can simultaneously 
provide high performance and energy efficiency [3]. 

Approximate computation trades the accuracy of intermediate or final results against 
delay, area, and power or energy consumption. This type of trade-off is very advantageous in 
applications that are inherently resilient to errors [4]. Such applications produce 
acceptable output despite the reduced accuracy of computations. Applications such as image or video 
processing, data mining, and machine learning are error resilient, and therefore approximate computation 
can be used for them [5]. Approximate computational methods can be applied at different levels of 
computation, which extend from logic gates to compilers and programming languages [6]. Much software 
and hardware research has been performed in the field of approximate computation [7]-[12]. Voltage 
over-scaling [13], [14] and functional approximation [15] are the two main categories of hardware approximate 
computation. 
Software approximate computation methods are divided into two main categories: i) loop perforation 
and function approximation [16], [17], and ii) programming language support [18], [19]. Most hardware 
approximate computation procedures focus on basic computation modules such as adders [20], [21] and 
multipliers [22], [23]. The full adder circuit is a basic unit in digital arithmetic and logic circuits [24]. 

Many studies have been carried out on approximate computation, but most of the resulting designs have been 
implemented as ASICs. Since there are structural differences between ASICs and FPGAs, 
approximate computation approaches designed for ASICs cannot be implemented directly on 
FPGAs and achieve the same results as on ASICs [25]. In this study, we investigate approximate 
computation with respect to the FPGA architecture and propose an adder whose design uses the carry 
speculative adder (CSPA) and the design methodology for building approximate adders for FPGA-based 
systems (DeMAS) adders. Section 2 introduces existing approximate adders. Section 3 introduces the 
proposed approximate adder. Section 4 presents implementation tests of the proposed adder and a 
comparison with other approximate adders, and section 5 concludes the paper. 


2. EXISTING APPROXIMATE ADDERS 
2.1. Carry speculative adder (CSPA) 

A CSPA is shown in Figure 1 [26]. This adder is similar to the carry select adder (CSA), with the 
difference that each block of the CSPA includes one sum generator, two internal carry generators 
(one generating carries for a carry-in of 0 and one for a carry-in of 1), and one carry predictor. 
The output of the carry predictor of the i-th block is used to choose the output of one of the two 
internal carry generators of the (i+1)-th block for the sum generator unit. The carry predictor unit uses 
only k<x bits, where x is the number of bits in each block and k is the number of bits from each block 
used for predicting the carry. 
Figure 1. Structure of the CSPA adder [26] 
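
As a behavioural illustration of the block structure described above, the following Python sketch models carry speculation in software. It is a minimal model under stated assumptions, not the circuit of [26]: the function name `cspa_add`, the parameters `block_bits` and `pred_bits`, and the choice of predicting each block's carry-out from its most significant bit positions only are all illustrative.

```python
def cspa_add(a, b, n=8, block_bits=4, pred_bits=2):
    """Behavioural model of a carry-speculative addition of two n-bit values.

    Hypothetical sketch: each block adds its operand slices using a carry-in
    that is *predicted* from the top pred_bits bit positions of the previous
    block, so no carry ripples across more than one block boundary.
    """
    assert n % block_bits == 0 and 0 < pred_bits <= block_bits
    mask = (1 << block_bits) - 1
    shift = block_bits - pred_bits          # low bits ignored by the predictor
    result, carry_in = 0, 0
    for i in range(n // block_bits):
        a_blk = (a >> (i * block_bits)) & mask
        b_blk = (b >> (i * block_bits)) & mask
        # Sum generator: uses the speculated carry from the previous block.
        result |= ((a_blk + b_blk + carry_in) & mask) << (i * block_bits)
        # Carry predictor: looks only at the top pred_bits of this block.
        carry_in = ((a_blk >> shift) + (b_blk >> shift)) >> pred_bits
    return result | (carry_in << n)         # predicted carry-out of last block
```

For example, `cspa_add(0x0F, 0x01, 8, 4, 4)` returns the exact sum 16, whereas with `pred_bits=2` the predictor misses the carry out of the low block and the call returns 0.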


2.2. DeMAS adder 

Prabakaran et al. [27] introduce eight look-up table (LUT)-based approximate adders, called DeMAS, 
for the Xilinx Virtex-7 series FPGAs; the architectural features of the target FPGA device were 
considered in their design. These adders are shown in Figure 2 (note that adder-6 is a 2-bit adder and has the 
highest accuracy among them). In adder-6, the truth table output value of the LUT3 is "8E" and the truth 
table output value of the LUT5 is "E080FEF8". In this method, an N-bit adder is built by using these 
approximate adders for the k least significant bits (LSBs) and an accurate adder for the (N-k) most 
significant bits (MSBs). 
Figure 2. LUT-based approximate adders (a) adder-1 (b) adder-2 (c) adder-3 (d) adder-4 (e) adder-5 
(f) adder-6 (g) adder-7 and (h) adder-8 [27] 
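
The hexadecimal values quoted above are LUT initialization vectors: bit i of the vector is the LUT output for the input combination whose binary encoding is i. The Python sketch below evaluates such a vector. The helper name `lut_output` and the ordering of the inputs (first argument as the most significant index bit) are assumptions made for illustration, since the pin ordering is not stated in the text.

```python
def lut_output(init, *inputs):
    """Return the output bit of a LUT with initialization vector `init`.

    Assumption: the inputs are listed most significant first, so the last
    argument selects between adjacent bits of `init`.
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return (init >> index) & 1


LUT3_APPROX_SUM = 0x8E   # value quoted in the text for the low sum bit
XOR3_EXACT_SUM = 0x96    # truth table of an exact 3-input XOR (full-adder sum)

# Number of input combinations where the approximate low sum bit is wrong.
mismatches = bin(LUT3_APPROX_SUM ^ XOR3_EXACT_SUM).count("1")
```

Because a 3-input XOR is symmetric in its inputs, the count of mismatching entries (two of eight) does not depend on the assumed input ordering.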


3. METHODOLOGY 

The architecture of the proposed adder is shown in Figure 3. In this architecture, the adder is divided 
into x-bit blocks, similar to the CSPA. A carry predictor is used in each block, but instead of the gates of the 
internal adder of each block, a modified DeMAS adder, named MDeMAS, is used. The truth tables of 
a precise 2-bit adder, DeMAS adder-6, and the proposed adder are given in Figure 4. In the DeMAS adder, the 
most significant bit of the first operand, A1, is taken as Cout. The most significant bit of the sum, S1, is 
generated by a LUT5 with inputs A1A0B1B0C0 and output value "E080FEF8", and the least significant bit, S0, 
is generated by a LUT3 with inputs A0B0C0 and output value "8E". 
Figure 3. Architecture of the proposed adder using 2-bit blocks 


As shown in Table 1, in the DeMAS method for adder-6, the largest error magnitude is 2, the 
sum of the error magnitudes is 28, and the average error is 0.875, which is relatively high. In the proposed 
adder, the following modifications have been made to DeMAS to reduce the approximation error. The carry 
predictor presented in [26] is used for generating Cout. The inputs of the carry predictor are 
A1A0B1B0, so it can be implemented with one LUT4. In the DeMAS method, the output carry is taken 
directly from the input A1, whereas the proposed method uses the carry produced by the carry predictor, and 
the accuracy is therefore higher. 
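
The error figures quoted above are taken over all 32 input combinations of a 2-bit block (two 2-bit operands and a carry-in), so the average error is the sum of the error magnitudes divided by 32, for example 28/32 = 0.875. The sketch below computes the same three metrics for any candidate block. `error_metrics` and `drop_carry_in` are hypothetical names, and the example adder that ignores its carry-in is only a stand-in, not one of the adders from [27].

```python
from itertools import product


def error_metrics(approx_add):
    """Return (max error, sum of error magnitudes, mean error) of a 2-bit block.

    `approx_add(a, b, c0)` must return the approximate 3-bit result
    (carry-out and two sum bits packed as one integer).
    """
    errors = [abs(approx_add(a, b, c0) - (a + b + c0))
              for a, b, c0 in product(range(4), range(4), range(2))]
    return max(errors), sum(errors), sum(errors) / len(errors)


def drop_carry_in(a, b, c0):
    """Hypothetical approximate block that simply ignores the carry-in."""
    return a + b


metrics = error_metrics(drop_carry_in)   # error is 1 whenever c0 == 1
```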

In the proposed method, we generate the 2-bit sum using one LUT6-2. The sum output is generated 
according to the carry produced by the carry predictor, so that there is only a slight difference between the 
output sum and the precise sum. As illustrated in Table 1, the predicted output carry differs from 
the precise output carry in four states. In the states where the predicted carry is equal to the exact output 
carry, we make the approximate sum equal to the exact sum; in the other states, we set the approximate sum 
according to the predicted carry so that the difference from the precise sum is as small as possible. 
Table 1. Truth table of exact, approximate DeMAS and proposed approximate MDeMAS adder 
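
The construction of the 2-bit block described above can be sketched as follows. The exact predictor function of [26] is not reproduced in the text, so this sketch assumes that the predicted carry is the carry of the two operands with the carry-in ignored. Under that assumption the prediction differs from the exact carry in four of the 32 input states, which is consistent with the count stated above. The function names are illustrative.

```python
from itertools import product


def predict_carry(a, b):
    """Assumed 4-input carry predictor: carry-out of a + b, ignoring carry-in."""
    return (a + b) >> 2


def mdemas_block(a, b, c0):
    """Approximate 2-bit block: predicted carry plus the closest 2-bit sum."""
    exact = a + b + c0
    cout = predict_carry(a, b)
    # With Cout fixed, pick the 2-bit sum that lands nearest to the exact value.
    sum_bits = min(range(4), key=lambda s: abs((cout << 2) + s - exact))
    return cout, sum_bits


states = list(product(range(4), range(4), range(2)))
carry_mismatches = [(a, b, c0) for a, b, c0 in states
                    if predict_carry(a, b) != (a + b + c0) >> 2]
sum_errors = [abs((c << 2) + s - (a + b + c0))
              for (a, b, c0) in states
              for c, s in [mdemas_block(a, b, c0)]]
```

With this assumed predictor, the four mismatching states are those in which the operands add up to 3 and the carry-in is 1, where the block outputs 3 instead of 4. These numbers follow from the assumption and are not taken from Table 1.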
In the DeMAS method, k/2 DeMAS adder-6 blocks and a precise (N-k)-bit adder are combined to 
form the N-bit adder. This adder is shown in Figure 4. The main problem with this N-bit adder is the 
presence of the carry chain between the DeMAS adder-6 blocks and inside the precise adder, which 
increases the delay. In the proposed adder, the following alterations were made to the DeMAS multi-bit adder 
to reduce the delay. 

As shown in Figure 4, we have replaced each DeMAS adder-6 with a carry predictor block and an 
MDeMAS adder, and removed the carry chain between blocks. Since the accuracy of the proposed MDeMAS 
adder and carry predictor is higher than that of the DeMAS adder-6, the removal of the carry chain does not 
reduce the overall accuracy of the N-bit adder. In the N-bit DeMAS adder, a precise adder is employed to 
compute the most significant (N-k) bits of the sum, which increases the total delay of the adder. In the 
proposed adder, we have removed this precise adder and increased the accuracy of the carry generation and 
of the MDeMAS internal adder of each block. 
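
To illustrate the effect of removing the carry chain, the sketch below assembles an N-bit adder from 2-bit blocks whose carry-in comes only from the neighbouring block's predictor. It assumes a predictor that returns the carry of the two operand slices with the carry-in ignored, and it is a behavioural model with hypothetical names, not the synthesized design.

```python
def mdemas_add(a, b, n=8):
    """Behavioural N-bit adder built from 2-bit blocks without a carry chain."""
    result, carry_in = 0, 0
    for i in range(0, n, 2):
        a_blk, b_blk = (a >> i) & 3, (b >> i) & 3
        exact = a_blk + b_blk + carry_in
        # Assumed predictor: depends only on this block's operand bits.
        cout = (a_blk + b_blk) >> 2
        # Sum bits chosen as close as possible to the exact block result.
        sum_bits = min(max(exact - (cout << 2), 0), 3)
        result |= sum_bits << i
        carry_in = cout            # feeds the next block only
    return result | (carry_in << n)
```

In this model `mdemas_add(0x34, 0x21)` returns the exact sum 85, while `mdemas_add(0b1111, 0b0001)` returns 12 instead of 16, because the carry entering the second block is not propagated through it.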
Figure 4. Truth table of exact, approximate DeMAS and proposed approximate MDeMAS adder 
4. RESULTS AND DISCUSSION 
4.1. Experimental setup 

To evaluate the proposed adder and the existing approximate adders, we have used them in a 2D 3x3 Gaussian 
convolution filter, which is used to blur images. The architecture of this filter is shown in Figure 5. The 
employed Gaussian kernel has zero mean and σ=1. The CSPA [26], DeMAS [27], and the 
proposed MDeMAS adders have been designed and implemented using the very high-speed integrated circuit 
(VHSIC) hardware description language (VHDL) in the Xilinx ISE environment. Using each of 
these adders, we have designed a Gaussian filter based on the architecture in [28] and synthesized 
it to evaluate and compare the area and delay on the Virtex-7 family device 7VX330T. MATLAB has 
been used to analyze the accuracy and quality of the designed filter outputs. 
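
For reference, a 3x3 Gaussian kernel with zero mean and σ=1 can be generated as in the following Python sketch. Sampling at integer offsets and normalizing to unit sum are common conventions assumed here; the text does not state how the kernel coefficients were quantized for hardware.

```python
import math


def gaussian_kernel_3x3(sigma=1.0):
    """Sample a zero-mean 2D Gaussian at offsets -1, 0, 1 and normalize it."""
    weights = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                for dx in (-1, 0, 1)]
               for dy in (-1, 0, 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


kernel = gaussian_kernel_3x3(1.0)
```

With σ=1 the normalized weights are roughly 0.204 at the centre, 0.124 at the edges, and 0.075 at the corners.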


4.2. Accuracy result 

The output quality of the Gaussian filters built with different approximate adders is reported in Table 2 
using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) criteria. The Lena 
image with 512x512 dimensions has been used as the filter input. Approximate adders with different 
configurations have been employed in the Gaussian filter. The value of BlockSize represents the size of the 
blocks in the CSPA and MDeMAS adders. The MDeMAS adder is built only with a block size of 2 for 
optimal use of Virtex-7 FPGA resources. The value of approxBits in the DeMAS adder represents the 
number of least significant bits of the sum obtained by approximation. In the filter based on DeMAS, the 
higher the approxBits value, the lower the output quality, although the delay improves. The output quality of 
the filter based on the proposed MDeMAS adder is comparable to that of the filter based on 
DeMAS with only a 2-bit approximation. 




Figure 5. Gaussian filter convolution [28] 


Table 2. Results of Gaussian filter circuits quality based on different approximate adders 

Gaussian filter   Configuration   PSNR    SSIM 
2DGSF-CSPA        BlockSize=2     21.45   0.44 
                  BlockSize=4     22.33   0.47 
2DGSF-DeMAS       approxBits=2    22.96   0.45 
                  approxBits=4    21.85   0.42 
                  approxBits=6    18.88   0.36 
2DGSF-MDeMAS      BlockSize=2     22.79   0.44 
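
PSNR compares a filtered image with a reference image; for 8-bit grey-scale pixels it is commonly defined as 10·log10(255²/MSE), where MSE is the mean squared pixel difference. The following sketch computes it for two flat pixel sequences. The function name and the list-based interface are illustrative, and the MATLAB evaluation used for Table 2 is not reproduced here.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)
```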


4.3. Delay and area results 

The CSPA and DeMAS adders and the proposed adder have been used to build the Gaussian filter 
shown in Figure 5. The designed filters have been synthesized on the Virtex-7 family device 7VX330T. The 
results are listed in Table 3. As the results show, the designed filter has the lowest area when 
using the approximate MDeMAS adder. Its delay is in the range of that of the filter based on DeMAS 
with 6 approximate bits, which has the lowest output quality. The input pixels of the filters are grey-scale, 
and therefore the input values of the adders are 8-bit. 
Table 3. Synthesis results of constructed Gaussian filter based on different approximate adders 

Gaussian filter   Configuration   Area (LUT)   Delay (ns)   Area×Delay 
2DGSF-CSPA        BlockSize=2     61           4.59         279.99 
                  BlockSize=4     61           4.992        304.512 
2DGSF-DeMAS       approxBits=2    112          7.613        852.656 
                  approxBits=4    96           6.017        577.632 
                  approxBits=6    82           4.421        362.522 
2DGSF-MDeMAS      BlockSize=2     51           4.89         249.39 


As shown in Figure 6, the delay of the filter built with the approximate DeMAS adder with a 6-bit 
approximation is lower than the delay of the filter designed with our proposed adder. This is because DeMAS 
uses 6 bits of approximation, which in turn produces poor image quality. According to the 
output quality results, a Gaussian filter design based on the proposed method can reduce the delay with only 
a small loss of output quality, and can reduce the utilization of resources considerably, as shown in Figure 7. 
The product of area and delay captures the trade-off between the two measures, since reducing the delay 
generally requires more area. Therefore, to optimize both measures together, their product should be 
minimized. Figure 8 shows that the proposed method is superior to the other methods in terms of this 
criterion. According to the output quality results in Table 2 and the synthesis results in Table 3, the Gaussian 
filter designed with the proposed adder reduces the delay with only a small loss of output quality and 
considerably reduces the utilization of resources. 
Figure 8. Bar chart of the area-delay product of the Gaussian filters constructed using different 
approximate adders 
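
The area-delay products in Table 3 can be recomputed directly from the reported LUT counts and delays, as in the short sketch below. The dictionary layout and labels are illustrative; the numbers are those listed in Table 3.

```python
# (area in LUTs, delay in ns) as reported in Table 3
results = {
    "CSPA BlockSize=2":   (61, 4.59),
    "CSPA BlockSize=4":   (61, 4.992),
    "DeMAS approxBits=2": (112, 7.613),
    "DeMAS approxBits=4": (96, 6.017),
    "DeMAS approxBits=6": (82, 4.421),
    "MDeMAS BlockSize=2": (51, 4.89),
}

# Area-delay product for each filter; lower is better.
adp = {name: area * delay for name, (area, delay) in results.items()}
best = min(adp, key=adp.get)
```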
5. CONCLUSION 

In this study, we have designed a new approximate adder for implementation on FPGAs; the CSPA 
and DeMAS adders were used in its design. The proposed approximate adder was used to build a 
Gaussian filter, which was then implemented on a Virtex-7 FPGA. The results demonstrated that resource 
utilization decreased by 20-51% and that the delay of the filter based on the MDeMAS adder 
improved by 10-35%, depending on the obtained output quality. 